Assessment of substitution model adequacy using frequentist and Bayesian methods.
نویسندگان
چکیده
In order to have confidence in model-based phylogenetic methods, such as maximum likelihood (ML) and Bayesian analyses, one must use an appropriate model of molecular evolution identified using statistically rigorous criteria. Although model selection methods such as the likelihood ratio test and Akaike information criterion are widely used in the phylogenetic literature, model selection methods lack the ability to reject all models if they provide an inadequate fit to the data. There are two methods, however, that assess absolute model adequacy, the frequentist Goldman-Cox (GC) test and Bayesian posterior predictive simulations (PPSs), which are commonly used in conjunction with the multinomial log likelihood test statistic. In this study, we use empirical and simulated data to evaluate the adequacy of common substitution models using both frequentist and Bayesian methods and compare the results with those obtained with model selection methods. In addition, we investigate the relationship between model adequacy and performance in ML and Bayesian analyses in terms of topology, branch lengths, and bipartition support. We show that tests of model adequacy based on the multinomial likelihood often fail to reject simple substitution models, especially when the models incorporate among-site rate variation (ASRV), and normally fail to reject less complex models than those chosen by model selection methods. In addition, we find that PPSs often fail to reject simpler models than the GC test. Use of the simplest substitution models not rejected based on fit normally results in similar but divergent estimates of tree topology and branch lengths. In addition, use of the simplest adequate substitution models can affect estimates of bipartition support, although these differences are often small with the largest differences confined to poorly supported nodes. We also find that alternative assumptions about ASRV can affect tree topology, tree length, and bipartition support. Our results suggest that using the simplest substitution models not rejected based on fit may be a valid alternative to implementing more complex models identified by model selection methods. However, all common substitution models may fail to recover the correct topology and assign appropriate bipartition support if the true tree shape is difficult to estimate regardless of model adequacy.
منابع مشابه
Comparison of Bayesian and Frequentist Methods in Estimating the Net Reclassification and Integrated Discrimination Improvement Indices for Evaluation of Prediction Models: Tehran Lipid and Glucose Study
Introduction: The Frequency-based method is commonly used to estimate the Net Reclassification Improvement (NRI)- and Integrated Discrimination Improvement (IDI) indices. These indices measure the magnitude of the performance of statistical models when a new biomarker is added. This method has poor performance in some cases, especially in small samples. In this study, the performance of two Bay...
متن کاملA mixed Bayesian/Frequentist approach in sample size determination problem for clinical trials
In this paper we introduce a stochastic optimization method based ona mixed Bayesian/frequentist approach to a sample size determinationproblem in a clinical trial. The data are assumed to come from a nor-mal distribution for which both the mean and the variance are unknown.In contrast to the usual Bayesian decision theoretic methodology, whichassumes a single decision maker, our method recogni...
متن کاملA Decision between Bayesian and Frequentist Upper Limit in Analyzing Continuous Gravitational Waves
Given the sensitivity of current ground-based Gravitational Wave (GW) detectors, any continuous-wave signal we can realistically expect will be at a level or below the background noise. Hence, any data analysis of detector data will need to rely on statistical techniques to separate the signal from the noise. While with the current sensitivity of our detectors we do not expect to detect any tru...
متن کاملBayesian and Iterative Maximum Likelihood Estimation of the Coefficients in Logistic Regression Analysis with Linked Data
This paper considers logistic regression analysis with linked data. It is shown that, in logistic regression analysis with linked data, a finite mixture of Bernoulli distributions can be used for modeling the response variables. We proposed an iterative maximum likelihood estimator for the regression coefficients that takes the matching probabilities into account. Next, the Bayesian counterpart...
متن کاملPuMA: Bayesian analysis of partitioned (and unpartitioned) model adequacy
SUMMARY The accuracy of Bayesian phylogenetic inference using molecular data depends on the use of proper models of sequence evolution. Although choosing the best model available from a pool of alternatives has become standard practice in statistical phylogenetics, assessment of the chosen model's adequacy is rare. Programs for Bayesian phylogenetic inference have recently begun to implement mo...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Molecular biology and evolution
دوره 27 12 شماره
صفحات -
تاریخ انتشار 2010